Smart Paper Notes

Creating a Safety Assurance Case for an ML Satellite-Based Wildfire Detection and Alert System

Richard Hawkins , Chiara Picardi , Lucy Donnell , Murray Ireland

Category: Machine Learning

2022-11-08

Wildfires are a common problem in many areas of the world with often catastrophic consequences. A number of systems have been created to provide early warnings of wildfires, including those that use satellite data to detect fires. The increased availability of small satellites, such as CubeSats, allows the wildfire detection response time to be reduced by deploying constellations of multiple satellites over regions of interest. By using machine learned components on-board the satellites, constraints which limit the amount of data that can be processed and sent back to ground stations can be overcome. There are hazards associated with wildfire alert systems, such as failing to detect the presence of a wildfire, or detecting a wildfire in the incorrect location. It is therefore necessary to be able to create a safety assurance case for the wildfire alert ML component that demonstrates it is sufficiently safe for use. This paper describes in detail how a safety assurance case for an ML wildfire alert system is created. This represents the first fully developed safety case for an ML component containing explicit argument and evidence as to the safety of the machine learning.

Translated by Google Translate

Deep Learning for Space Weather Prediction: Bridging the Gap between Heliophysics Data and Theory

John C. Dorelli , Chris Bard , Thomas Y. Chen , Daniel Da Silva , Luiz Fernando Guides dos Santos , Jack Ireland , Michael Kirk , Ryan McGranaghan , Ayris Narock , Teresa Nieves-Chinchilla

Category: Machine Learning

2022-12-27

Traditionally, data analysis and theory have been viewed as separate disciplines, each feeding into fundamentally different types of models. Modern deep learning technology is beginning to unify these two disciplines and will produce a new class of predictively powerful space weather models that combine the physical insights gained from data and theory. We call on NASA to invest in the research and infrastructure necessary for the heliophysics community to take advantage of these advances.

Heliophysics Discovery Tools for the 21st Century: Data Science and Machine Learning Structures and Recommendations for 2020-2050

R. M. McGranaghan , B. Thompson , E. Camporeale , J. Bortnik , M. Bobra , G. Lapenta , S. Wing , B. Poduval , S. Lotz , S. Murray

Category: Artificial Intelligence | Machine Learning

2022-12-26

Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge.

ImPaKT: A Dataset for Open-Schema Knowledge Base Construction

Luke Vilnis , Zach Fisher , Bhargav Kanagal , Patrick Murray , Sumit Sanghai

Category: Natural Language Processing

2022-12-21

Large language models have ushered in a golden age of semantic parsing. The seq2seq paradigm allows for open-schema and abstractive attribute and relation extraction given only small amounts of fine-tuning data. Language model pretraining has simultaneously enabled great strides in natural language inference, reasoning about entailment and implication in free text. These advances motivate us to construct ImPaKT, a dataset for open-schema information extraction, consisting of around 2500 text snippets from the C4 corpus, in the shopping domain (product buying guides), professionally annotated with extracted attributes, types, attribute summaries (attribute schema discovery from idiosyncratic text), many-to-one relations between compound and atomic attributes, and implication relations. We release this data in the hope that it will be useful in fine-tuning semantic parsers for information extraction and knowledge base construction across a variety of domains. We evaluate the power of this approach by fine-tuning the open source UL2 language model on a subset of the dataset, extracting a set of implication relations from a corpus of product buying guides, and conducting human evaluations of the resulting predictions.

Talking About Large Language Models

Murray Shanahan

Category: Natural Language Processing | Machine Learning

2022-12-07

Thanks to rapid progress in artificial intelligence, we have entered an era when technology and philosophy intersect in interesting ways. Sitting squarely at the centre of this intersection are large language models (LLMs). The more adept LLMs become at mimicking human language, the more vulnerable we become to anthropomorphism, to seeing the systems in which they are embedded as more human-like than they really are. This trend is amplified by the natural tendency to use philosophically loaded terms, such as "knows", "believes", and "thinks", when describing these systems. To mitigate this trend, this paper advocates the practice of repeatedly stepping back to remind ourselves of how LLMs, and the systems of which they form a part, actually work. The hope is that increased scientific precision will encourage more philosophical nuance in the discourse around artificial intelligence, both within the field and in the public sphere.

Exploration of Convolutional Neural Network Architectures for Large Region Map Automation

R. M. Tsenov , C. J. Henry , J. L. Storie , C. D. Storie , B. Murray , M. Sokolov

Category: Computer Vision | Machine Learning

2022-11-07

Deep learning semantic segmentation algorithms have provided improved frameworks for the automated production of Land-Use and Land-Cover (LULC) maps, significantly increasing both the frequency of map generation and the consistency of production quality. In this research, a total of 28 different model variations were examined to improve the accuracy of LULC maps. The experiments were carried out using Landsat 5/7 or Landsat 8 satellite images with the North American Land Change Monitoring System (NALCMS) labels. The performance of various CNNs and extension combinations was assessed, with VGGNet with an output stride of 4 and a modified U-Net architecture providing the best results. Additional expanded analysis of the generated LULC maps was also provided. Using a deep neural network, this work achieved 92.4% accuracy for 13 LULC classes within southern Manitoba, representing a 15.8% improvement over published results for the NALCMS. Based on the large regions of interest, the higher radiometric resolution of Landsat 8 data resulted in better overall accuracy (88.04%) compared to Landsat 5/7 (80.66%) for 16 LULC classes. This represents an 11.44% and 4.06% increase in overall accuracy compared to previously published NALCMS results, while covering a larger land area and a higher number of LULC classes than other published LULC map automation methods.
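The accuracy figures in this abstract can be cross-checked with a little arithmetic. The sketch below assumes the quoted improvements are absolute percentage-point gains (the abstract does not say so explicitly) and recovers the baseline each pair of numbers implies. The function and dictionary names are illustrative only.

```python
def implied_baseline(accuracy, improvement):
    """Baseline accuracy implied by a reported accuracy and an absolute gain.

    Assumes the gain is in percentage points, which is an interpretation
    of the abstract rather than something it states.
    """
    return round(accuracy - improvement, 2)


# (reported accuracy %, reported improvement %) taken from the abstract
reported = {
    "southern Manitoba, 13 classes": (92.4, 15.8),
    "Landsat 8, 16 classes": (88.04, 11.44),
    "Landsat 5/7, 16 classes": (80.66, 4.06),
}

baselines = {
    name: implied_baseline(acc, gain) for name, (acc, gain) in reported.items()
}

for name, base in baselines.items():
    print(f"{name}: implied baseline {base}%")
```

Under that assumption all three pairs imply the same baseline of 76.6%, which suggests the improvements are measured against a single previously published NALCMS accuracy.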

NVRadarNet: Real-Time Radar Obstacle and Free Space Detection for Autonomous Driving

Alexander Popov , Patrik Gebhardt , Ke Chen , Ryan Oldja , Heeseok Lee , Shane Murray , Ruchi Bhargava , Nikolai Smolyanskiy

Category: Computer Vision | Machine Learning | Robotics

2022-09-29

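Detecting obstacles is crucial for safe and efficient autonomous driving. To this end, we present NVRadarNet, a deep neural network (DNN) that detects dynamic obstacles and drivable free space using automotive radar sensors. The network uses temporally accumulated data from multiple radar sensors to detect dynamic obstacles and compute their orientation in a top-down bird's-eye view (BEV). The network also regresses drivable free space in order to detect unclassified obstacles. Our DNN is the first of its kind to use sparse radar signals to perform obstacle and free space detection from radar data in real time. The network has been successfully used on our autonomous vehicles in real self-driving scenarios. It runs faster than real time on an embedded GPU and generalizes well across geographic regions.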

Efficient Approximate Kernel Based Spike Sequence Classification

Sarwan Ali , Bikram Sahoo , Muhammad Asad Khan , Alexander Zelikovsky , Imdad Ullah Khan , Murray Patterson

Category: Machine Learning

2022-09-11

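Machine learning (ML) models such as SVMs, when used for tasks like classification and clustering of sequences, require a definition of distance or similarity between pairs of sequences. Several methods have been proposed to compute the similarity between sequences, such as exact approaches that count the number of matches between $k$-mers (subsequences of length $k$) and approximate approaches that estimate pairwise similarity scores. Although exact methods yield better classification performance, their high computational cost limits their applicability to a small number of sequences. Approximate algorithms have been shown to be more scalable and to perform comparably to (sometimes better than) the exact methods. They are designed in a "general" way to handle different types of sequences (e.g., music, proteins). Although general applicability is a desirable property of an algorithm, it is not what is needed in every scenario. For example, in the current COVID-19 (coronavirus) pandemic, there is a need for an approach that deals with coronavirus sequences specifically. To this end, we propose a series of ways to improve the performance of the approximate kernel (using minimizers and information gain) in order to enhance its predictive performance on coronavirus sequences. More specifically, we use domain knowledge (computed using information gain) and efficient preprocessing (using minimizer computation) to improve the quality of the approximate kernel for classifying coronavirus spike protein sequences corresponding to different variants (e.g., Alpha, Beta, Gamma). We report results using different classification and clustering algorithms and evaluate their performance using multiple evaluation metrics. Using two datasets, we show that our proposed method helps improve the kernel's performance compared to baseline and state-of-the-art approaches in the healthcare domain.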
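The two building blocks named in this abstract, $k$-mer matching and minimizers, can be illustrated with a minimal sketch. It shows an exact $k$-mer (spectrum-style) similarity and one common definition of a minimizer (the lexicographically smallest $m$-mer inside each $k$-mer). The function names, and the choice of this particular minimizer definition, are assumptions made for illustration; the paper's approximate kernel and its information-gain step are not reproduced here.

```python
from collections import Counter


def kmers(seq, k):
    """Return all contiguous substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_kernel(a, b, k):
    """Exact k-mer similarity: dot product of the two k-mer count vectors."""
    counts_a, counts_b = Counter(kmers(a, k)), Counter(kmers(b, k))
    shared = counts_a.keys() & counts_b.keys()
    return sum(counts_a[kmer] * counts_b[kmer] for kmer in shared)


def minimizers(seq, k, m):
    """For each k-mer, keep its lexicographically smallest m-mer (m <= k).

    This is one common minimizer definition, assumed here for illustration.
    """
    return [min(kmers(kmer, m)) for kmer in kmers(seq, k)]


if __name__ == "__main__":
    print(kmer_kernel("ACGTAC", "ACGG", 2))
    print(minimizers("ACGTAC", 4, 2))
```

Replacing each $k$-mer with a shorter minimizer shrinks the alphabet of features that the kernel has to compare, which is the kind of preprocessing saving the abstract alludes to.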

Gradient Descent Temporal Difference-difference Learning

Rong J. B. Zhu , James M. Murray

Category: Machine Learning

2022-09-10

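Off-policy algorithms, in which a behavior policy that differs from the target policy is used to gain experience for learning, have proven to be of great practical value in reinforcement learning. However, even for simple convex problems such as linear value function approximation, these algorithms are not guaranteed to be stable. To address this, alternative algorithms that provably converge in such cases have been introduced, the best known being gradient descent temporal difference (GTD) learning. This algorithm and others like it, however, tend to converge much more slowly than conventional temporal difference learning. In this paper we propose gradient descent temporal difference-difference (Gradient-DD) learning, which improves on GTD2 by introducing second-order differences in successive parameter updates. We study this algorithm in the framework of linear value function approximation and prove its convergence theoretically by applying the theory of stochastic approximation. The analysis shows its improvement over GTD2. Studying the model empirically on the random walk task, the Boyan-chain task, and Baird's off-policy counterexample, we find substantial improvement over GTD2 and, in some cases, better performance even than conventional TD learning.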
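For context, the GTD2 baseline that Gradient-DD builds on can be sketched as a single update step under linear value function approximation. The sketch follows the standard GTD2 rule (a primary weight vector `theta` and an auxiliary vector `w`) and, for simplicity, omits importance-sampling weights. The function names and step sizes are illustrative, and the additional second-order difference term of Gradient-DD is not shown because the abstract does not give its form.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def gtd2_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 update for the linear value estimate V(s) = theta . phi(s).

    theta: value weights; w: auxiliary weights;
    phi, phi_next: feature vectors of the current and next state.
    Importance-sampling ratios are omitted in this simplified sketch.
    Returns the updated (theta, w) as new lists.
    """
    # Temporal-difference error under the current value weights.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    w_phi = dot(w, phi)
    new_theta = [
        t + alpha * (p - gamma * pn) * w_phi
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    new_w = [wi + beta * (delta - w_phi) * p for wi, p in zip(w, phi)]
    return new_theta, new_w


if __name__ == "__main__":
    theta, w = [0.0, 0.0], [0.0, 0.0]
    for _ in range(2):
        theta, w = gtd2_step(theta, w, [1.0, 0.0], [0.0, 1.0],
                             reward=1.0, gamma=0.9, alpha=0.1, beta=0.5)
    print(theta, w)
```

Because `theta` only moves once `w` has picked up a non-zero estimate of the expected TD error, the first update leaves `theta` unchanged, which gives some intuition for the slow convergence the abstract mentions.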

Rates of Convergence for Regression with the Graph Poly-Laplacian

Nicolás García Trillos , Ryan Murray , Matthew Thorpe

Category: Machine Learning (Statistics) | Machine Learning

2022-09-06

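In the (special) smoothing spline problem one considers a variational problem with a quadratic data fidelity penalty and Laplacian regularisation. Higher order regularity can be obtained by replacing the Laplacian regulariser with a poly-Laplacian regulariser. The methodology is readily adapted to graphs, and here we consider graph poly-Laplacian regularisation in a fully supervised, non-parametric, noise-corrupted regression problem. In particular, given a dataset $\{x_i\}_{i=1}^n$ and a set of noisy labels $\{y_i\}_{i=1}^n \subset \mathbb{R}$, we let $u_n : \{x_i\}_{i=1}^n \to \mathbb{R}$ be the minimiser of an energy consisting of a data fidelity term and an appropriately scaled graph poly-Laplacian term. When $y_i = g(x_i) + \xi_i$ for iid noise $\xi_i$, and using the geometric random graph, we identify (with high probability) the rate of convergence of $u_n$ to $g$ in the large data limit $n \to \infty$. Furthermore, our rate, up to logarithms, coincides with the known rate of convergence in the usual smoothing spline model.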
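The energy described in this abstract can be written out schematically. The form below is a sketch under assumed notation, since the abstract gives no formula: $\Delta_n$ denotes a symmetric graph Laplacian on $\{x_i\}_{i=1}^n$, $s \geq 1$ is the power of the Laplacian, and $\tau > 0$ is the regularisation weight. The paper's exact scaling may differ.

```latex
% Assumed notation (not from the abstract): \Delta_n, s, \tau, \langle\cdot,\cdot\rangle_n
\mathcal{E}_n(u) = \frac{1}{n}\sum_{i=1}^{n} \bigl(u(x_i) - y_i\bigr)^2
  + \tau \,\langle u, \Delta_n^{s} u \rangle_n,
\qquad
\langle u, v \rangle_n = \frac{1}{n}\sum_{i=1}^{n} u(x_i)\, v(x_i).

% For symmetric \Delta_n, setting the gradient of \mathcal{E}_n to zero
% gives a linear system for the minimiser u_n:
\bigl(I + \tau \Delta_n^{s}\bigr)\, u_n = y .
```

With $s = 1$ this reduces to ordinary graph Laplacian regularisation, and larger $s$ penalises higher-order roughness, mirroring the move from Laplacian to poly-Laplacian regularisers in the continuum smoothing spline setting.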
